CD8 TIL projection uncover cells types hidden by transient gene programs

Inteferon, Cell-Cycle and Tissue-residency gene programs

Background

Cells responding to external cues can overexpress transient gene programs. In scRNA-seq data, clusters will be driven by the strongest variation component, which can be sometimes transient gene programs. These programs such as the cell cycle, might hide the original, potentially more stable cell type beneath.

These are among the main types of program that classically are being upregulated in scRNA-seq data:

  • Cell cycle, when cells go through division (eg. MCM5, TOP2A)

  • IFN response, in response to IFN stimulation (eg. MX1, ISG15)

  • HSP response, in response to stress (eg. HSPA1, HSPA2)

  • Activation response, in response to activation signal, eg TCR engagement (eg. JUN, FOS)

  • Resident program, when cell infiltrate and home within tissues (eg. ZNF683, ITGAE)

Note: while some programs are clearly transient (like the cell cycle), some programs might be more permanant / epigenetically fixed (such as the resident program). We argue here than having a well-curated reference, free from many transient gene programs, will help in distinguishing transient from more stable cell states.

Here we will see how we can use reference-based projection to uncover the original cell types from IFN-activated, Cycling and Resident-memory cells using data from Gueguen et al. 2021.

In this study, the authors describe population of:

  • IFN-activated cells: CD4/8-ISG15
  • Cycling cells: CD4/8-MCM5 and CD4/8-TOP2A.
  • Resident cells: CD8-XCL1 , CD8-GZMH and CD8-LAYN.

As these cycling cells are a mix of CD4 and CD8 T cells, we will see first how we can isolate only CD8 from this mix of CD8/CD4 using the built-in scGate filtering included in the CD8 reference. We will then see if our reference-based approach ProjecTILs can help recover the original cell types annotation.

Data setup

Loading the CD8 reference

First let’s load the CD8 TIL reference by downloading it from Figshare.

# Load the reference
options(timeout = max(900, getOption("timeout")))
#download.file("https://figshare.com/ndownloader/files/38921366", destfile = "CD8T_human_ref_v1.rds")
ref.cd8 <- load.reference.map("CD8T_human_ref_v1.rds")
## [1] "Loading Custom Reference Atlas..."
## [1] "Loaded Custom Reference map Human CD8 TILs"
# Setup colors
mycols <- ref.cd8@misc$atlas.palette

# Plotting
DimPlot(ref.cd8,  group.by = 'functional.cluster', label = T, repel = T, cols = mycols) + theme(aspect.ratio = 1)

We can see that the reference is composed only of what should only be stable cell types. Indeed, cells expressing highly transient gene programs (Cycling and IFN) were removed when constructing the reference.

Load Gueguen et al. data

#download.file("https://figshare.com/ndownloader/files/39082049", destfile = "gueguen.cd3.Rds")
gueguen.cd3 <- readRDS("gueguen.cd3.Rds")
gueguen.cd3$seurat_clusters <- Idents(gueguen.cd3)

# DimPlot
DimPlot(gueguen.cd3, order = T,  label = T, repel = T) 

Projection using reference map

# Projection
DefaultAssay(gueguen.cd3) <- "RNA"
gueguen.cd3 <- ProjecTILs.classifier(gueguen.cd3, ref = ref.cd8, filter.cells = T, split.by = 'orig.ident', ncores = 6)
table(gueguen.cd3$functional.cluster)
## 
##        CD8.CM        CD8.EM      CD8.MAIT CD8.NaiveLike     CD8.TEMRA 
##          2132          3181           147           238           276 
##       CD8.TEX      CD8.TPEX 
##          3340           337
# Plotting
DimPlot(gueguen.cd3, group.by = 'functional.cluster', order = T, cols = mycols, label = T, repel = T)

We can chose to isolate specific clusters, or directly isolate cells by which gene programs / signature they express. First, let’s see focus on the cluster identified by their program.

1-IFN-activated cells: CD4/8-ISG15 cluster

gueguen.ifn <- subset(gueguen.cd3, subset = seurat_clusters %in% c('CD4/8-ISG15'))

# Dimplot
DimPlot(gueguen.ifn, group.by = 'functional.cluster', label = T, repel = T, cols = mycols)

# barplot 
plot.statepred.composition(ref.cd8, gueguen.ifn, metric = "Percent") +
    ggtitle("IFN-stimulated cells") + ylim(0, 75) + theme_bw() + RotatedAxis()

IFN-stimulated cells are mainly comprising CD8.EM, CD8.TEX and CD8.CM.

# Radar plots
p <- plot.states.radar(ref.cd8, query = gueguen.ifn, min.cells = 10, genes4radar = c('LEF1', "TCF7", "CCR7", "GZMK", "FGFBP2",'FCGR3A','ZNF683','ITGAE', "CRTAM", "CD200",'GNG4', "HAVCR2", "TOX", "ENTPD1", 'TYROBP','KIR2DL1', "ISG15","MX1","IFIT1"), return = T) 
wrap_plots(p) + theme_bw()

The radar show good matching between the reference and the IFN-stimualted query, but differ for the IFN genes that we selected (ISG15, MX1 and IFIT1).

2-Cycling cells: CD4/8-MCM5 and CD4/8-TOP2A clusters

gueguen.cycling <- subset(gueguen.cd3, subset = seurat_clusters %in% c('CD4/8-TOP2A', "CD4/8-MCM5"))

# Dimplot
DimPlot(gueguen.cycling, group.by = 'functional.cluster', label = T, repel = T, cols = mycols)
## Warning: ggrepel: 1 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps

# barplot 
plot.statepred.composition(ref.cd8, gueguen.cycling, metric = "Percent") +
    ggtitle("Cycling cells") + ylim(0, 75) + theme_bw() + RotatedAxis()

We can see that we can recover the original phenotype of the cycling cells, including Effector Memory (EM), Central Memory (CM), Exhausted (TEX), and Progenitor Exhausted (TPEX). Let’s check the radar plots to see if they match the reference.

# Radar plots
plot.states.radar(ref.cd8, query = gueguen.cycling, min.cells = 10, genes4radar = c('LEF1', "TCF7", "CCR7", "GZMK", "FGFBP2",'FCGR3A','ZNF683','ITGAE', "CRTAM", "CD200",'GNG4', "HAVCR2", "TOX", "ENTPD1", 'TYROBP','KIR2DL1',"TOP2A","MCM5","PCNA")) 

Indeed, the radars confirm that the annotation seems of good quality. Radar are good way to show that phenotypes match between the reference and the query, but they differ by the cycling genes (TOP2A, MCM5 and PCNA).

3-Resident cells: CD8-XCL1 , CD8-GZMH and CD8-LAYN

gueguen.resident <- subset(gueguen.cd3, subset = seurat_clusters %in% c('CD8-XCL1', "CD8-GZMH","CD8-LAYN"))

# Dimplot
DimPlot(gueguen.resident, group.by = 'functional.cluster', label = T, repel = T, cols = mycols)

# barplot 
plot.statepred.composition(ref.cd8, gueguen.resident, metric = "Percent") +
    ggtitle("Resident cells") + ylim(0, 75) + theme_bw() + RotatedAxis()

We can see that we can recover the original phenotype of the resident cells, including Effector Memory (EM), Central Memory (CM), Exhausted (TEX), and Progenitor Exhausted (TPEX). Let’s check the radar plots to see if they match the reference.

# Radar plots
plot.states.radar(ref.cd8, query = gueguen.resident, min.cells = 10, genes4radar = c('LEF1', "TCF7", "CCR7", "GZMK", "FGFBP2",'FCGR3A','ITGAE', "CRTAM", "CD200",'GNG4', "HAVCR2", "TOX", "ENTPD1", 'TYROBP','KIR2DL1',"ZNF683","ITGA1","ITGAE")) 

Indeed, the radars confirm that the annotation seems of good quality. Radar are good way to show that phenotypes match between the reference and the query, but they differ by the resident genes (ZNF683, ITGA1, ITGAE and CXCR6).

UCell to further increase granularity of ProjecTils classifications based on gene programs

Instead of taking unsupervised clusters, we can also divide the dataset by computing signatures scores. We now want to discriminate further our ProjecTILs classification using signatures. To this end, we can use our collection of useful transcriptional gene signatures: SignatuR, as well as manually curated signatures to compute module score or signature scores using UCell.

We will generate signatures for: - G1S cell cycle - G2M cell cycle - IFN response - Tissue resident cells

library(SignatuR)
signatures <- GetSignature(SignatuR$Hs)
signatures <- signatures[c("cellCycle.G1S","cellCycle.G2M")]

# Using manual signatures for IFN and resident, which work better with UCell than the full signatures from SignatuR package
signatures[["IFN"]] <- c("ISG15","IFI6","IFI44L","MX1")
signatures[["Tissue_Resident"]] <-  c("ITGAE")

Now let’s add the scores using UCell.

gueguen.cd3 <- AddModuleScore_UCell(gueguen.cd3, features = signatures, ncores = 8,
                       assay = "RNA")
Should I use unsupervised clusters or UCell scores to measure gene programs?

It all depends on the dataset that you have. If you have annotated your dataset with ProjecTILs, you won’t have any cluster driven by transient programs. In this case, you can use UCell signature scoring. Else, if you have clusters clearly driven by transient gene programs, you can use ProjecTILs mapping to unravel the underlying cell types within these clusters of interest.

FeaturePlot(gueguen.cd3, features = paste0(names(signatures), "_UCell"),
    order = T, pt.size = 0.3) & scale_color_paletteer_c("pals::coolwarm") 
## Scale for colour is already present.
## Adding another scale for colour, which will replace the existing scale.
## Scale for colour is already present.
## Adding another scale for colour, which will replace the existing scale.
## Scale for colour is already present.
## Adding another scale for colour, which will replace the existing scale.
## Scale for colour is already present.
## Adding another scale for colour, which will replace the existing scale.

We can see that these signatures correlate with their annotation of the original paper.

Now, let’s check signatures scores distribution to decide which threshold put to distinguish negative/positive cells.

qplot(gueguen.cd3$cellCycle.G1S_UCell, main = "G1S-Cycling") +
    qplot(gueguen.cd3$cellCycle.G2M_UCell, main = "G2M-Cycling") + qplot(gueguen.cd3$IFN_UCell, main = "IFN") + qplot(gueguen.cd3$Tissue_Resident_UCell, main = "Tissue Resident") & theme_bw()

# subset resident high cells in the CD8 reference
resident.high <- subset(gueguen.cd3, subset = Tissue_Resident_UCell > 0.2)
resident.low <- subset(gueguen.cd3, subset = Tissue_Resident_UCell <= 0.2)

# Plotting
DimPlot(resident.high) + ggtitle("Resident-high cells") & DimPlot(resident.low)  + ggtitle("Resident-low cells") & NoLegend()

# Set list
resident.list <- list(resident.low,resident.high)
names(resident.list) <- c("resident.low","resident.high")
# Radars to check differences between resident / non resident
plot.states.radar(ref = ref.cd8, query = resident.list, labels.col = 'functional.cluster', min.cells = 5, genes4radar = c("TCF7","CCR7","KLF2","GZMK","LMNA","FCGR3A","FGFBP2","XCL1","KLRB1", 'ZNF683',"ITGAE", 'ITGA1')) 

Resident-high and resident-low cells display the same overall phenotypes, but differ in their expression of key resident genes, upregulated when cells home whithin tissue, such as integrins (ITGAE, ITGA1) or the transcription factor HOBIT (ZNF683).

By looking at the UCell scores distribution let’s categorise the cells by putting a UCell score threshold of 0.2.

wrap_plots(pll, guides = "collect")

As previously shown, CD8.TEX and CD8.TPEX are the most cycling, resident and IFN-responding subsets. For the IFN signal, results are inconsistent between UCell signature scores and projected clustes. Results will vary depending on the signatures chosen, the UCell threshold chosen, as well as the clustering resolution.